9 research outputs found

    A two-armed bandit based scheme for accelerated decentralized learning

    Get PDF
    The two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus one must balance between exploiting existing knowledge about the arms and obtaining new information. Bandit problems are particularly fascinating because a large class of real-world problems, including routing, QoS control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel scheme for decentralized decision making based on the Goore Game in which each decision maker is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling conjugate priors and on random sampling from these posteriors. We further report theoretical results on the variance of the random rewards experienced by each individual decision maker. Based on these theoretical results, each decision maker is able to accelerate its own learning by taking advantage of the increasingly reliable feedback that is obtained as exploration gradually turns into exploitation in bandit-based learning. Extensive experiments demonstrate that the accelerated learning allows us to combine the benefit of conservative learning, which is high accuracy, with the benefit of hurried learning, which is fast convergence. In this manner, our scheme outperforms recently proposed Goore Game solution schemes, where one has to trade off accuracy against speed. We thus believe that our methodology opens avenues for improved performance in a number of applications of bandit-based decentralized decision making.
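    As a minimal sketch of the sampling mechanism described above, assuming Bernoulli rewards and Beta conjugate priors (the variance-based acceleration and the Goore Game coupling reported in the paper are omitted), each decision maker only updates hyperparameters and samples from the resulting posteriors:

        import random

        class BayesianTwoArmedPlayer:
            """Keeps a Beta posterior per arm and plays the arm whose posterior
            sample is largest, updating only the Beta hyperparameters."""

            def __init__(self):
                self.params = [[1, 1], [1, 1]]  # (successes + 1, failures + 1) per arm

            def select_arm(self):
                samples = [random.betavariate(a, b) for a, b in self.params]
                return samples.index(max(samples))

            def update(self, arm, reward):
                self.params[arm][0] += reward        # reward is 1 (success)
                self.params[arm][1] += 1 - reward    # or 0 (failure)

        # Toy run against two arms with hypothetical, unknown success probabilities
        true_p = [0.4, 0.6]
        player = BayesianTwoArmedPlayer()
        for _ in range(1000):
            arm = player.select_arm()
            reward = 1 if random.random() < true_p[arm] else 0
            player.update(arm, reward)
        print("posterior hyperparameters:", player.params)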

    A Stochastic Search on the Line-Based Solution to Discretized Estimation

    Get PDF
    Recently, Oommen and Rueda [11] presented a strategy by which the parameters of a binomial/multinomial distribution can be estimated when the underlying distribution is nonstationary. The method has been referred to as the Stochastic Learning Weak Estimator (SLWE), and is based on the principles of continuous stochastic Learning Automata (LA). In this paper, we consider a new family of stochastic discretized weak estimators pertinent to tracking time-varying binomial distributions. As opposed to the SLWE, our proposed estimator is discretized, i.e., the estimate can assume only a finite number of values. It is well known in the field of LA that discretized schemes converge faster than their continuous counterparts. By virtue of discretization, our estimator realizes extremely fast adjustments of the running estimates by jumps, and it is thus able to robustly, and very quickly, track changes in the parameters of the distribution after a switch has occurred in the environment. The design principle of our strategy is based on a solution, pioneered by Oommen [7], for the Stochastic Search on the Line (SSL) problem. The SSL solution proposed in [7] assumes the existence of an Oracle which informs the LA whether to go “right” or “left”. In our application domain, in order to achieve efficient estimation, we must first infer (or rather simulate) such an Oracle. To overcome this difficulty, we intelligently construct an “Artificial Oracle” that suggests whether we are to increase the current estimate or to decrease it. The paper briefly reports conclusive experimental results that demonstrate the ability of the proposed estimator to cope with non-stationary environments with a high adaptation rate, and with an accuracy that depends on its resolution. The results we present are, to the best of our knowledge, the first reported results that resolve the problem of discretized weak estimation using an SSL-based solution.
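    The following is a minimal sketch of a discretized estimator of this flavour: the estimate lives on a grid of resolution N and is nudged up or down by a simulated oracle. The comparison rule used below to simulate the oracle (its expected drift is proportional to the gap between the true parameter and the estimate) is an illustrative assumption, not necessarily the paper's exact Artificial Oracle construction:

        import random

        def discretized_weak_estimator(stream, resolution=100):
            """Track the parameter of a (possibly switching) Bernoulli stream on the
            grid {0, 1/N, ..., 1}. A simulated oracle suggests stepping the estimate
            up or down by comparing each observation with a draw from the estimate."""
            step = 1.0 / resolution
            estimate = 0.5
            for x in stream:
                shadow = 1 if random.random() < estimate else 0  # draw from current estimate
                if x == 1 and shadow == 0:
                    estimate = min(1.0, estimate + step)   # oracle: "go right"
                elif x == 0 and shadow == 1:
                    estimate = max(0.0, estimate - step)   # oracle: "go left"
                yield estimate

        # Non-stationary stream: the parameter switches from 0.2 to 0.8 halfway through
        stream = [1 if random.random() < (0.2 if t < 2000 else 0.8) else 0 for t in range(4000)]
        estimates = list(discretized_weak_estimator(stream))
        print(round(estimates[1999], 2), round(estimates[-1], 2))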

    Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

    Full text link
    The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies; such experiments have been lacking in the literature for the Bernoulli case until now. Comment: 15 pages, 2 figures, submitted to ALT (Algorithmic Learning Theory).
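    For context, the Lai and Robbins lower bound referred to above states that, for Bernoulli arms with means \mu_a and optimal mean \mu^*, the expected cumulative regret R(T) of any consistent policy satisfies

        \liminf_{T \to \infty} \frac{\mathbb{E}[R(T)]}{\ln T}
            \;\ge\; \sum_{a : \mu_a < \mu^*} \frac{\mu^* - \mu_a}{\mathrm{kl}(\mu_a, \mu^*)},
        \qquad
        \mathrm{kl}(p, q) = p \ln\frac{p}{q} + (1 - p) \ln\frac{1 - p}{1 - q},

    and the paper's contribution is a finite-time regret bound for Thompson Sampling whose leading term matches this rate.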

    Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters

    Get PDF
    The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus one must balance between exploiting existing knowledge about the arms and obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel solution scheme for bandit problems with non-stationary normally distributed rewards. The scheme is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling Kalman Filters and on random sampling from these posteriors. Furthermore, it is able to track the better actions, thus supporting non-stationary bandit problems. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms, not only in non-stationary environments but also in stationary ones. Furthermore, our scheme is robust to inexact parameter settings. We thus believe that our methodology opens avenues for obtaining improved novel solutions.
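    A minimal sketch of this idea, assuming one scalar Kalman filter per arm with known observation and transition noise levels (the concrete parameter values below are illustrative, not taken from the paper):

        import random

        class KalmanBanditPlayer:
            """One scalar Kalman filter per arm; play the arm whose posterior sample
            is largest. Noise levels are assumed known here; the abstract reports
            robustness to inexact settings."""

            def __init__(self, n_arms, sigma_obs=1.0, sigma_tr=0.1):
                self.obs_var = sigma_obs ** 2
                self.tr_var = sigma_tr ** 2
                self.means = [0.0] * n_arms
                self.vars = [100.0] * n_arms   # vague initial posteriors

            def select_arm(self):
                samples = [random.gauss(m, v ** 0.5) for m, v in zip(self.means, self.vars)]
                return samples.index(max(samples))

            def update(self, arm, reward):
                # Every arm may drift, so all posterior variances grow by the transition noise
                self.vars = [v + self.tr_var for v in self.vars]
                # Standard Kalman measurement update for the pulled arm only
                gain = self.vars[arm] / (self.vars[arm] + self.obs_var)
                self.means[arm] += gain * (reward - self.means[arm])
                self.vars[arm] *= 1.0 - gain

        # Example: rewards drawn from slowly drifting normal means
        player = KalmanBanditPlayer(n_arms=3)
        arm_means = [0.0, 1.0, 2.0]
        for _ in range(2000):
            arm = player.select_arm()
            player.update(arm, random.gauss(arm_means[arm], 1.0))
            arm_means = [m + random.gauss(0, 0.05) for m in arm_means]  # non-stationary drift
        print("tracked means:", [round(m, 2) for m in player.means])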

    A Bayesian learning automata-based distributed channel selection scheme for cognitive radio networks

    No full text
    We consider a scenario where multiple Secondary Users (SUs) operate within a Cognitive Radio Network (CRN) which involves a set of channels, where each channel is associated with a Primary User (PU). We investigate two channel access strategies for SU transmissions. In the first strategy, the SUs send a packet directly, without operating Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA), whenever a PU is absent in the selected channel. In the second strategy, the SUs implement CSMA/CA to further reduce the probability of collisions among co-channel SUs. For each strategy, the channel selection problem is formulated and demonstrated to be a so-called "Potential" game, and a Bayesian Learning Automaton (BLA) is incorporated into each SU so as to play the game in such a manner that the SU can adapt itself to the environment. The performance of the BLA in this application is evaluated through rigorous simulations. These simulation results illustrate the convergence of the SUs to the global optimum under the first strategy, and to a Nash Equilibrium (NE) point under the second.
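    A toy sketch of the first access strategy follows: each SU runs its own BLA with one Beta posterior per channel, and a transmission counts as successful only if the PU is idle and no other SU chose the same channel. The PU activity probabilities and network sizes are purely illustrative, not the simulation setup of the paper:

        import random

        class BLA:
            """One Bayesian Learning Automaton per Secondary User: a Beta posterior
            per channel over the probability of a successful transmission."""

            def __init__(self, n_channels):
                self.a = [1] * n_channels
                self.b = [1] * n_channels

            def choose(self):
                samples = [random.betavariate(a, b) for a, b in zip(self.a, self.b)]
                return samples.index(max(samples))

            def observe(self, channel, success):
                if success:
                    self.a[channel] += 1
                else:
                    self.b[channel] += 1

        pu_busy = [0.9, 0.3, 0.4, 0.2]            # illustrative PU activity per channel
        sus = [BLA(len(pu_busy)) for _ in range(3)]
        for _ in range(5000):
            picks = [su.choose() for su in sus]
            for su, channel in zip(sus, picks):
                collision = picks.count(channel) > 1
                pu_active = random.random() < pu_busy[channel]
                su.observe(channel, not collision and not pu_active)
        print("settled channels:", [su.choose() for su in sus])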

    Scalable Independent Multi-level Distribution in Multimedia Content Analysis

    No full text
    Due to the limited processing resources available on a typical host, monolithic multimedia content analysis applications are often restricted to simple content analysis tasks, covering a small number of media streams. This limitation on processing resources can often be reduced by parallelizing and distributing an application, utilizing the processing resources on several hosts. However, multimedia content analysis applications consist of multiple logical levels, such as streaming, filtering, feature extraction, and classification. This complexity makes parallelization and distribution a difficult task, as each logical level may require special-purpose techniques. In this paper, we propose a component-based framework where each logical level can be parallelized and distributed independently.
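    Purely as a toy illustration of per-level parallelization (not the framework proposed in the paper, which also distributes levels across hosts), each logical level below gets its own, independently sized worker pool; all stage functions and pool sizes are hypothetical:

        from concurrent.futures import ThreadPoolExecutor

        def filter_frame(frame):
            return frame                      # placeholder filtering stage

        def extract_features(frame):
            return [float(frame)]             # placeholder feature extraction stage

        def classify(features):
            return int(sum(features) > 5)     # placeholder classification stage

        # Each (stage, pool size) pair can be scaled independently of the others
        stages = [(filter_frame, 2), (extract_features, 4), (classify, 1)]

        def analyse(stream):
            data = list(stream)
            for stage_fn, n_workers in stages:
                with ThreadPoolExecutor(max_workers=n_workers) as pool:
                    data = list(pool.map(stage_fn, data))
            return data

        print(analyse(range(10)))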

    Land pressure, monetarization, and the individualization of production systems in the cotton-growing zone of Togo

    Get PDF
    Increased land pressure and the monetarization of exchanges are transforming production systems: sedentarization of agriculture, falling yields and labour productivity, and intensified migration. Cash crops develop through an increase in the area cultivated per worker, which encourages a simplification of cropping systems. Diverse farmer strategies are identified.

    Generalized Bayesian pursuit: A novel scheme for multi-armed Bernoulli bandit problems

    Get PDF
    In the last decades, a myriad of approaches to the multi-armed bandit problem have appeared in several different fields. The current top-performing algorithms from the field of Learning Automata reside in the Pursuit family, while UCB-Tuned and the ε-greedy class of algorithms can be seen as state-of-the-art regret-minimizing algorithms. Recently, however, the Bayesian Learning Automaton (BLA) outperformed all of these, and other schemes, in a wide range of experiments. Although seemingly incompatible, in this paper we integrate the foundational learning principles motivating the design of the BLA with the principles of the so-called Generalized Pursuit algorithm (GPST), leading to the Generalized Bayesian Pursuit algorithm (GBPST). As in the BLA, the estimates are truly Bayesian in nature; however, instead of basing exploration upon direct sampling from the estimates, GBPST explores by means of the arm selection probability vector of GPST. Further, as in the GPST, in the interest of higher rates of learning, a set of arms that are currently perceived as being optimal is pursued to minimize the probability of pursuing a wrong arm. It turns out that GBPST is superior to GPST and that it even performs better than the BLA when its learning speed is suitably controlled. We thus believe that GBPST constitutes a new avenue of research in which the performance benefits of the GPST and the BLA are mutually augmented, opening up for improved performance in a number of applications, currently being tested.
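    A much-simplified sketch of the combination described above, with Bayesian (Beta) reward estimates driving a pursuit-style probability vector; the published GBPST update rule is more refined than the one used here, and the learning rate is a hypothetical parameter:

        import random

        class GeneralizedBayesianPursuit:
            """Simplified sketch: Beta posteriors per arm, explored through a
            pursuit-style arm-selection probability vector."""

            def __init__(self, n_arms, learning_rate=0.05):
                self.n = n_arms
                self.rate = learning_rate
                self.a = [1] * n_arms                 # Beta hyperparameters: successes + 1
                self.b = [1] * n_arms                 # Beta hyperparameters: failures + 1
                self.p = [1.0 / n_arms] * n_arms      # arm-selection probability vector

            def select_arm(self):
                return random.choices(range(self.n), weights=self.p)[0]

            def update(self, arm, reward):
                # 1. Bayesian estimate update: only the hyperparameters change
                self.a[arm] += reward
                self.b[arm] += 1 - reward
                # 2. Pursuit update: move probability mass towards the set of arms
                #    that currently look optimal under the posterior means
                means = [a / (a + b) for a, b in zip(self.a, self.b)]
                best = max(means)
                pursued = [i for i, m in enumerate(means) if m == best]
                for i in range(self.n):
                    target = 1.0 / len(pursued) if i in pursued else 0.0
                    self.p[i] += self.rate * (target - self.p[i])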

    Using Theory of Mind to assess users' sense of agency in social chatbots

    No full text
    Technological advancement in the field of chatbot research is booming. Despite this, it is still difficult to assess which social characteristics a chatbot needs to have for the user to interact with it as if it had a mind of its own. Review studies have highlighted that the main cause is the low number of research papers dedicated to this question, and the lack of a consistent protocol within the papers that do address it. In the current paper, we suggest the use of a Theory of Mind task to measure the implicit social behaviour users exhibit towards a text-based chatbot. We present preliminary findings suggesting that participants adapt towards this basic chatbot significantly more than when they conduct the task alone (p < .017). This task is quick to administer and does not require a second chatbot for comparison, making it an efficient universal task. With it, a database could be built with scores of all existing chatbots, allowing fast and efficient meta-analyses to discover which characteristics make a chatbot appear more 'human'.